A File System Based Inverted Index (LUT CS-TR 996)

Authors

  • Jon P. Knight
  • Martin Hamilton
Abstract

This paper documents the design and development of a simple file system based inverted index for rapid retrieval of information from resource templates in a Subject Based Information Gateway (SBIG) application on the World Wide Web (WWW). The index mechanism trades file system space for speed and ease of implementation. It uses the UNIX hierarchical filesystem structure to hold the index and provide rapid access to the index files with only a few disc accesses. Unlike DBM and the other simple UNIX based databases, this index has no limit on its bucket size, other than that imposed by the physical disc space available.
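The general idea of holding an inverted index in a directory hierarchy can be illustrated with a minimal sketch. The abstract does not give the paper's actual on-disc layout, so everything below is an assumption made for illustration: the two-level, one-character-per-directory scheme, the `.idx` suffix, and the hypothetical function names `term_path`, `add_posting`, and `lookup`. Each term maps to one posting file, reached through a short chain of directories, and the file simply grows as document identifiers are appended, so its size is bounded only by available disc space.

```python
import os
import tempfile


def term_path(root, term, depth=2):
    """Map a term to its posting file (hypothetical layout: root/i/n/index.idx)."""
    term = term.lower()
    if not term.isalnum():
        # Assumption: terms are plain alphanumeric tokens, so they are
        # always safe to use as file and directory names.
        raise ValueError("term must be alphanumeric: %r" % term)
    # One directory level per leading character keeps each directory small.
    levels = list(term[:depth])
    # The suffix keeps posting files from clashing with directory names.
    return os.path.join(root, *levels, term + ".idx")


def add_posting(root, term, doc_id):
    """Append a document identifier to the term's posting file."""
    path = term_path(root, term)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(doc_id + "\n")


def lookup(root, term):
    """Return all document identifiers recorded for a term."""
    try:
        with open(term_path(root, term), encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        add_posting(root, "index", "template-1")
        add_posting(root, "index", "template-2")
        print(lookup(root, "index"))
```

In this sketch a lookup costs a handful of directory traversals plus one file read, which mirrors the trade described in the abstract: extra file system space (many small files and directories) in exchange for speed and a very simple implementation.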

Similar resources

CLIP: A Compact, Load-balancing Index Placement Function

Existing file searching tools do not have the performance or accuracy that search engines have. This is especially a problem in large-scale distributed file systems, where better-performing file searching tools are much needed for enterprise-level systems. Search engines use inverted indices to store terms and other metadata. Although some desktop file searching tools use indices to store file ...

Efficient Processing of Category-Restricted Queries for Web Directories

We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.

An Inverted File Cache for Fast Information Retrieval

The inverted file is the most popular indexing mechanism used for document search in an information retrieval system (IRS). However, the disk I/O for accessing the inverted file becomes a bottleneck in an IRS. To avoid using the disk I/O, we propose a caching mechanism for accessing the inverted file, called the inverted file cache (IF cache). In this cache, a proposed hashing scheme using a li...

Block-Ranking: Content Similarity Retrieval Based on Data Partition in Network Storage Environment

Nowadays, data partition plays an important role in eliminating duplicate data in green storage and cloud storage system. Fixed-sized chunking and content based chunking are two kinds of commonly used partition methods to break a file into a sequence of blocks. Meanwhile, inverted index has become the standard indexing method in modern information retrieval field. For conveniently analyzing, th...

S-Index: a Hybrid Structure for Text Retrieval

Today, two classes of indexing methods enjoying wide applicability are the Inverted Index and the Superimposed Coding based Signature File (SC-SF). The former is most efficient in query processing but utilizes extra storage of size comparable to that of the textbase, whereas the latter is most efficient in storage utilization. The present study builds upon the results obtained in previous resea...

Publication date: 2007